Abstract


Sparseness of hidden unit activation is the common effect of the three primary methods 
(unsupervised pre-training, rectifier neural networks, and dropout) that significantly reduce 
overfitting in deep neural networks (DNN) and improve their performance in discriminative 
tasks. Sparsity allows designing a complex DNN, enjoying the benefits of such an expressive 
model while at the same time mitigating the undesired effect of complexity. Only a small subset 
of units is active for each different input. Sparse coding in the brain, the neurophysiological 
equivalent of sparseness of hidden unit activation, allows precise discrimination between similar 
stimuli. In the brain, synaptic pruning is a mechanism that encourages sparse coding and is
essential for learning. In addition, Hebbian unsupervised learning is followed by synaptic
pruning: synapses that are frequently active together have strong connections between them and 
are maintained, while the rarely used synapses with weak connections are eliminated (pruned). 
The Hebbian learning rule is the underlying principle of unsupervised learning in DNN. The purpose
of this article is to suggest a novel method that is consistent with these neuroscience
observations and with the order observed in the brain. The method includes the following steps: 1)
start with a very large, fully connected DNN with hundreds of nonlinear hidden layers; 2) perform
fast unsupervised pre-training; 3) prune the weights with the smallest absolute value; 4) run the
backpropagation algorithm on the trimmed DNN. Besides its direct effects on
sparsity and on the undesired complexity that causes overfitting, this technique is a strategy to
design very deep neural networks with hundreds of nonlinear hidden layers while maintaining a 
reduced number of parameters overall in the backpropagation algorithm. Network depth is of 
crucial importance, and this method allows designing a very deep network while keeping a reduced
number of parameters. Inspired by the human brain, very deep models with hundreds of hidden
layers may advance the field of artificial intelligence to a new level of performance. This
suggested method is consistent not only with the order observed in the brain of unsupervised 
learning followed by synaptic pruning but also with the dynamic structural feature of the brain: 
starting with a very high connectivity and deliberately removing weak synapses, to further 
improve a particular functionality of the brain. 



Introduction 


Deep neural networks (DNNs) are composed of many non-linear hidden layers, and this 
makes DNNs expressive models that can potentially learn very complicated relationships between
their inputs and outputs [13, 23]. For decades, however, DNNs could not be trained to produce
practical results because of overfitting [1]. The overfitting problem is increasingly likely to occur
as the complexity of the DNN increases [8, 23]. If the DNN has enough hidden units to model 
complicated relationships between its inputs and outputs, there will be many different settings of 
the weights that can model perfectly the relationship in the training data [13]. Each of these 
weight vectors, however, will produce different predictions on test data and will do worse on the 
test data compared to the training data. The weights were adjusted to work well together on the 
training data but not on test data [13]. In other words, the DNN memorizes the training data and
cannot generalize to new examples; this is called overfitting.

Since the development of the backpropagation algorithm (learning algorithm for DNN) in 
1986, many methods have been developed for reducing overfitting, but they failed in solving this 
problem in a satisfactory way that would have made DNN practical [1,23]. These include early 
stopping, weight decay, weight sharing, and model averaging. Only in 2006 did researchers
discover a method that significantly improves the performance of DNN [1,14,15]. The
technique is unsupervised pre-training using contrastive divergence and greedy layer-wise
learning [9,14,15]. The weights of the DNN are initialized by the unsupervised pre-training, and
the backpropagation algorithm is then used to fine-tune the model for
classification [14,15]. By 2017, two additional main methods had been shown to significantly
reduce overfitting: deep sparse rectifier neural networks and dropout [1,6,10,11,17,21,30].
Different theoretical explanations were suggested for the efficacy of each of these three
methods. The efficacy of unsupervised pre-training was explained by suggesting that the input
vectors contain more information than the labels, so the precious information in the labels is
used only for the fine-tuning [14,15]. The efficacy of deep sparse rectifier neural networks was
explained by the advantages of sparsity [10]. The effectiveness of dropout was explained by
the prevention of co-adaptation of hidden units [13]. However, all three methods have one
effect in common: the activations of the hidden units become sparse [10,19,23]. For deep sparse
rectifier neural networks, this effect stems directly from the properties of the rectifier.
Unsupervised pre-training and dropout, however, are not direct sparsity-inducing regularizers, and still, they
encourage sparseness of hidden unit activation [19,23]. Sparsity allows designing a very deep 
and complex DNN and enjoying the benefits of such an expressive model while at the same time 
mitigating the undesired effect of complexity. For each input, only a very small subset of units
is active, and this subset differs from one input to another.
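
As a rough illustration of what sparseness of hidden unit activation means for a rectifier layer, the following sketch measures the fraction of hidden units that are active for each input and checks that different inputs activate different subsets. The layer sizes, the random weights, and the negative bias are assumptions chosen only for this illustration; they are not taken from the cited works.

```python
import numpy as np

def relu(x):
    """Rectifier: negative pre-activations become exactly zero."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 20))                    # 5 hypothetical inputs, 20 features
weights = rng.normal(size=(20, 100)) / np.sqrt(20)   # 100 hidden units, random weights
bias = -1.0                                          # assumed negative bias

hidden = relu(inputs @ weights + bias)

# Fraction of hidden units with non-zero activation, one value per input.
active_fraction = (hidden > 0).mean(axis=1)

# The set of active units for each input, to compare the subsets.
active_sets = [frozenset(np.flatnonzero(row)) for row in hidden]
distinct_sets = len(set(active_sets))

print(active_fraction, distinct_sets)
```

In this toy setting only a minority of the units is active for any given input, and each input activates its own subset.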

Sparse Coding 

Sparse coding is a fundamental neuroscience observation and has become a concept of 
interest in computational neuroscience [7,10]. Sparse coding refers to the neurobiochemical 
phenomenon in which each stimulus is encoded by the activation of a small set of neurons, and
each stimulus is encoded by a different subset of all available neurons [7,10]. It was
introduced in computational neuroscience in the context of sparse coding in the visual, auditory, 
touch, and olfactory systems [5,16,20,29]. Researchers [5,16,20,29] claimed that sparse coding is 
the computational mechanism allowing precise discrimination between similar stimuli. 
Researchers [10] suggested that sparse coding enhances the capacity of the biological brain to
perform discriminative tasks by reducing overlap between representations. This neuroscience
observation is relevant to the performance of DNNs in discriminative tasks [10]. The fact that
the three primary methods that successfully reduce overfitting induce sparseness of hidden unit
activation is in accordance with this computational neuroscience principle of sparse coding.

The purpose of this paper is to suggest a pre-processing procedure for DNN that is based 
on a neurobiochemical process that induces sparsity in the biological brain: the process of
synaptic pruning.

Synaptic Pruning 

Synaptic pruning refers to the neurobiochemical process in the brain which includes
deliberate synapse elimination [2,3]. This process takes place in the mammalian brain mainly between
early childhood and the onset of puberty [2,4]. Researchers claimed that synaptic pruning is
influenced by learning and that it represents and supports learning [2,3,4,22]. At birth, the
brain starts with a very high connectivity, and during childhood and adolescence, in the process 
of synaptic pruning, weak synapses are deliberately removed in order to further improve a 
particular network capacity and a specific functionality in the brain. 

Synaptic pruning is determined by the principles of synaptic plasticity and the Hebbian learning
rule: synapses that are frequently active together have strong connections between them and are
maintained, while the rarely used synapses with weak connections are eliminated. Researchers
argued that synaptic pruning removes unnecessary neuronal structures and undesired redundant
complexity from the brain in order to support further learning [2,3,22]. It is important to
note the order: synaptic plasticity and the Hebbian learning rule come first, followed by
synaptic pruning. First, the Hebbian learning rule determines the synaptic strength; then
synaptic pruning proceeds according to that strength. As mentioned, in
DNN, the overfitting problem is the result of increased complexity of the DNN. Therefore, synaptic
pruning enhances learning through two mechanisms: inducing sparsity and removing
unnecessary neuronal structures.
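
The order described above (Hebbian strengthening first, pruning by strength second) can be sketched in a few lines. This is only a toy illustration: the plain Hebbian rule used here, the learning rate, the activity patterns, and the helper names `hebbian_update` and `prune_weakest` are assumptions made for the example, not a model of the biological process.

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1):
    """Plain Hebbian rule: w[i, j] grows when pre[i] and post[j] are co-active."""
    return weights + lr * np.outer(pre, post)

def prune_weakest(weights, n_remove):
    """Set the n_remove connections with the smallest magnitude to zero."""
    flat = weights.flatten()                       # flatten() returns a copy
    weakest = np.argsort(np.abs(flat))[:n_remove]
    flat[weakest] = 0.0
    return flat.reshape(weights.shape)

# 4 pre-synaptic and 3 post-synaptic units, all connections start at zero.
weights = np.zeros((4, 3))
frequent_pre, frequent_post = np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
rare_pre, rare_post = np.array([0.0, 0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])

# Step 1: Hebbian learning determines the connection strengths.
for _ in range(10):
    weights = hebbian_update(weights, frequent_pre, frequent_post)
weights = hebbian_update(weights, rare_pre, rare_post)   # co-active only once

# Step 2: pruning removes the weakest connections, based on those strengths.
pruned = prune_weakest(weights, n_remove=10)
print(pruned)
```

After pruning, only the two connections that were frequently co-active remain; the connection that was used once is eliminated together with the unused ones.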

Weight Pruning 

Weight pruning is an attempt to implement synaptic pruning in DNN [18,27,28]. 
Researchers [18,27,28] found that weight pruning improves performance of DNNs but not 
significantly enough to solve the overfitting problem. Weight pruning has been implemented 
during the backpropagation algorithm. This article suggests a novel way to implement weight 
pruning, before the backpropagation algorithm, that is consistent with neuroscience 
observations. The suggestion is to prune weights after unsupervised pre-training and before the
backpropagation algorithm, in order to support the subsequent discriminative learning performed
by backpropagation.

Method Description 

Method of Pre-processing a Deep Neural Network Which Includes Fast Unsupervised
Pre-Training Combined with Weight Pruning Before Backpropagation for
Addressing Overfitting and Designing a Very Deep Neural Network

As mentioned, in the brain, the synaptic plasticity principle of the Hebbian learning rule is
followed by synaptic pruning. The Hebbian learning rule is the underlying principle of unsupervised
learning in deep belief networks and deep sigmoid belief networks [9,14,15]. Therefore, the
suggestion of this article is to follow these synaptic plasticity principles and apply them in DNN.

The suggestion is to start with a very large, fully connected network with multiple nonlinear hidden
layers and many parameters, perform fast unsupervised pre-training, prune the weights with the
smallest absolute value, and then run the backpropagation algorithm on the trimmed DNN. This suggested
procedure is consistent with neuroscience observations of an unsupervised learning process
followed by synaptic pruning to support further learning. The effects of this procedure are
sparseness of hidden unit activation because of the pruning, and removal of undesired
complexity of the DNN before the backpropagation algorithm. As mentioned, sparseness of
hidden unit activation is the common effect of the three primary methods that successfully
address overfitting, and complexity of the DNN causes overfitting in the backpropagation algorithm.
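
The four steps of the procedure can be sketched end to end on a toy problem. The article does not specify an implementation, so everything concrete below is an assumption made for illustration: one-step contrastive divergence (CD-1) without biases as the pre-training stand-in, sigmoid units, a squared-error loss, an 80% pruning fraction, the toy data, the small layer sizes, and all function names. The point of the sketch is the ordering, and the mask that keeps pruned weights at zero during backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_pretrain(data, n_hidden, rng, epochs=5, lr=0.1):
    """Pre-train one layer as an RBM with CD-1 (biases omitted for brevity)."""
    w = 0.01 * rng.normal(size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        h_prob = sigmoid(data @ w)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_sample @ w.T)
        h_recon = sigmoid(v_recon @ w)
        w += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
    return w

def magnitude_mask(w, fraction):
    """Mask that is 0 for the `fraction` of weights with the smallest |w|."""
    k = int(round(fraction * w.size))
    mask = np.ones(w.size)
    mask[np.argsort(np.abs(w), axis=None)[:k]] = 0.0
    return mask.reshape(w.shape)

def forward(x, weights):
    acts = [x]
    for w in weights:
        acts.append(sigmoid(acts[-1] @ w))
    return acts

def finetune_step(x, y, weights, masks, lr=0.5):
    """One backpropagation step; masked updates keep pruned weights at zero."""
    acts = forward(x, weights)
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for i in reversed(range(len(weights))):
        grad = acts[i].T @ delta / len(x)
        if i > 0:  # propagate the error using the weights before they change
            delta = (delta @ weights[i].T) * acts[i] * (1 - acts[i])
        weights[i] -= lr * grad * masks[i]
    return weights

rng = np.random.default_rng(0)
x = (rng.random((64, 16)) > 0.5).astype(float)   # toy binary inputs
y = x[:, :1]                                     # toy target: the first input bit

# Steps 1 and 2: a fully connected network, pre-trained greedily layer by layer.
layer_sizes = [32, 32, 32]
weights, layer_input = [], x
for n_hidden in layer_sizes:
    w = cd1_pretrain(layer_input, n_hidden, rng)
    weights.append(w)
    layer_input = sigmoid(layer_input @ w)
weights.append(0.01 * rng.normal(size=(layer_sizes[-1], 1)))  # output layer, not pre-trained

# Step 3: prune the smallest-magnitude weights of the pre-trained layers.
masks = [magnitude_mask(w, 0.8) for w in weights[:-1]] + [np.ones_like(weights[-1])]
weights = [w * m for w, m in zip(weights, masks)]

# Step 4: backpropagation on the trimmed network.
for _ in range(200):
    weights = finetune_step(x, y, weights, masks)
```

Multiplying each gradient by the mask is one simple way to make sure that backpropagation only adjusts the connections that survived pruning; other ways of representing the trimmed network, such as sparse matrices, would serve the same purpose.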

This suggested pre-processing procedure is consistent with neuroscience observations of 
synaptic pruning following unsupervised learning in the mammalian brain. This pre-processing
procedure induces sparsity and reduces undesired complexity of the DNN, both of which are
essential in solving overfitting.

In addition, this suggested pre-processing procedure makes it possible to design a DNN with hundreds
of hidden layers while maintaining a reduced number of parameters in the backpropagation
algorithm. In this way, the method helps to avoid overfitting while enjoying the benefits of a
very deep network. Such a DNN can be a very expressive model because of its multiple non-linear
hidden layers, yet with a reduced number of parameters overall because of the pruning. Network
depth is of crucial importance, and very deep models can be very beneficial [11,12]. Inspired by
the human brain, very deep models with hundreds of hidden layers may advance the field of
artificial intelligence to a new level of performance.
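
A back-of-the-envelope count shows how pruning can keep the number of parameters of a very deep network below that of a much shallower dense one. The layer widths (784 inputs, 1000 units per hidden layer, 10 outputs), the depths, and the 99% pruning rate are hypothetical values chosen for the arithmetic only; biases are ignored.

```python
def dense_weight_count(layer_sizes):
    """Number of weights in a fully connected network (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def remaining_after_pruning(layer_sizes, prune_percent):
    """Weights left after removing prune_percent percent of them."""
    total = dense_weight_count(layer_sizes)
    return total * (100 - prune_percent) // 100

shallow = [784] + [1000] * 3 + [10]     # 3 hidden layers, no pruning
deep = [784] + [1000] * 200 + [10]      # 200 hidden layers, before pruning

shallow_weights = dense_weight_count(shallow)        # 2,794,000
deep_weights = dense_weight_count(deep)              # 199,794,000
deep_pruned = remaining_after_pruning(deep, 99)      # 1,997,940

print(shallow_weights, deep_weights, deep_pruned)
```

Under these assumed numbers, the pruned 200-layer network enters backpropagation with fewer weights than the unpruned 3-layer network.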

The method suggested in this article is consistent not only with the order observed in the
brain, in which unsupervised learning is followed by synaptic pruning, but also with the dynamic
structural feature of the brain: starting at birth with a very high connectivity and deliberately
removing weak synapses, as determined earlier by an unsupervised Hebbian learning rule, to further
improve a particular network capacity and a specific functionality of the brain.